Guidelines

The Library of Congress regularly updates its Recommended Formats Statement, a helpful quick reference for selecting a stable format when there is an opportunity to choose. If converting data from a proprietary format to an open file format results in some data loss, consider saving both versions. For less established or proprietary formats, consider recording the type, version, and software used to generate and play the file; this can be included in the metadata or documentation.

These guidelines may also be considered during file format selection:
13. Acquire the highest quality version of media to use for preservation
34. For EPUBs, opt for core media types, as defined by the EPUB specification

If a publication is document-like, exporting and transforming the core intellectual components to an existing standard for full-text publications (e.g. EPUB, TEI, or JATS/BITS XML) is a robust approach. This includes publications that contain multimedia or remote content, since these enhanced features can be managed more easily at scale when the rest of the publication is expressed in a standard form. Publications expressed in existing standards can be validated at scale and support both platform migration and preservation, and the standards may steer enhanced features toward a more consistent expression that works with the document.

These guidelines may also be helpful when considering the export package for a linear publication:
3. Use existing standards for export formats
10. Identify and document the core intellectual components of a work
20. Ensure exports cover all core intellectual components

An excessive number of small metadata files or a complex folder hierarchy within an export package adds complexity to the workflow. Ideally, export processes consolidate metadata into one file per publication, and the folder and file structure is mostly flat and predictable and follows a consistent naming convention. Metadata should be fully expressed within the metadata file, not via filenames and folder names, and should include references to the files being described so that they are easily connected. The complexity of a submitted information package affects how quickly and efficiently a preservation service can convert it to an archival information package. Reducing the number of separate metadata files and folders reduces processing time and can improve long-term stability by simplifying migration, either to a preservation system or to another platform. To the extent that the goal is an automated preservation workflow, the export packages should be consistent across publications.
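As a rough illustration of a single consolidated metadata file, the following Python sketch builds one record per publication that references every file in a flat package. The field names ("id", "files", "role") and the filenames are hypothetical, chosen only for this example; a real export would follow whatever schema the publisher and preservation service agree on.

```python
import json


def build_manifest(publication_id, title, files):
    """Return one metadata record that describes the publication and
    references every file in the package by name.

    The record layout is an assumption made for illustration only.
    """
    for entry in files:
        # A flat package has no sub-folders, so no path separators are expected.
        if "/" in entry["filename"] or "\\" in entry["filename"]:
            raise ValueError(f"package is not flat: {entry['filename']}")
    return {
        "id": publication_id,
        "title": title,
        "files": [
            {"filename": entry["filename"], "role": entry["role"]}
            for entry in files
        ],
    }


# Hypothetical package contents, named with one consistent convention.
manifest = build_manifest(
    "pub-0001",
    "Example Monograph",
    [
        {"filename": "pub-0001.epub", "role": "main text"},
        {"filename": "pub-0001_fig01.tif", "role": "figure 1, high-resolution"},
    ],
)

# One metadata file per publication, e.g. written out as pub-0001_metadata.json.
manifest_json = json.dumps(manifest, indent=2)
```

Because the description of each file lives in the record rather than in folder names, a preservation service can process every package with the same script.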

See also:
22. Use an appropriate metadata serialization within the export package

In addition to the main text and embedded or supplemental media, other features or content such as annotations, high-quality versions of media, supporting data, a visual walkthrough with the author, and peer reviews may be considered integral to the work in some cases. If so, these resources should be part of the export package so that they can be preserved alongside the publication. Special provisions may be needed to include artifacts that are hosted outside of the platform in the export.

See also:
10. Identify and document the core intellectual components of a work
72. Create a video walkthrough of any complex features

Each publication should have structured bibliographic metadata associated with it. This should be expressed as a separate file stored adjacent to or within the publication package. When possible, this should be expressed in a standard format such as ONIX, JATS, or Dublin Core. In order to process metadata at scale, the file naming convention, location of the file relative to the publication, and format should all remain consistent.
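For instance, a minimal Dublin Core record could be generated with Python's standard library as sketched below. The wrapper element name ("metadata") and all field values are placeholders assumed for this example; only the Dublin Core element namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a simple Dublin Core record from (element, value) pairs.

    The root element name "metadata" is an assumption for this sketch.
    """
    root = ET.Element("metadata")
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values; a real record would come from the publisher's system.
record_xml = dublin_core_record(
    [
        ("title", "Example Monograph"),
        ("creator", "Author, Example"),
        ("date", "2024-01-15"),
        ("identifier", "pub-0001"),
    ]
)
```

Writing the result to a file with the same name pattern and location in every package keeps the metadata easy to find when processing at scale.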

These guidelines may add context when deciding how to format bibliographic metadata and where to store it:
3. Use existing standards when creating metadata
22. Express metadata in an appropriate structured format
30. Add bibliographic metadata to an EPUB
45. Embed bibliographic metadata in a web page

When exporting metadata, ensure that the data format used to express it is appropriate for the content. For example, a CSV file will work for very simple metadata, but if the fields contain formatting, values that include new lines, or express specific data types, a CSV export could become unreliable or difficult to process. A structured format such as JSON or XML is generally more appropriate and can be validated for errors more easily.
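The difference can be seen in a small Python sketch that serializes the same record both ways. The record and its field names are invented for illustration.

```python
import csv
import io
import json

# A hypothetical record with a multi-line value and typed values.
record = {
    "title": "Example Monograph",
    "abstract": "First line.\nSecond line.",
    "page_count": 212,
    "open_access": True,
}

# CSV export: every value is written as text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
csv_text = buffer.getvalue()

# A CSV-aware parser recovers the quoted new line, but the number and the
# boolean come back as strings.
from_csv = next(csv.DictReader(io.StringIO(csv_text)))

# Line-oriented tooling sees three lines for a header plus one record.
physical_lines = csv_text.splitlines()

# JSON export: types and the embedded new line survive the round trip.
from_json = json.loads(json.dumps(record))
```

Here `from_csv["page_count"]` is the string "212", while `from_json["page_count"]` is still the integer 212, and any tool that counts lines in the CSV would miscount the records.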

Current publishing platforms can support frequent updates and new versions. These should be expressed clearly through the metadata so that the preserved copies can be properly distinguished from each other. If something has changed, the change should be reflected in the version and date, and, where necessary, new exports should be provided.

These guidelines also relate to versioning:
9. Determine the version of record in your context
31. Assign new identifiers to significant versions of a work

Fulcrum has structured its export packages, which include EPUBs, to support preservation. The enhanced media viewers that are used in the online version of Fulcrum EPUBs will not work if the Fulcrum platform is no longer available, so the export process simplifies these features to help ensure the EPUBs retain essential functionality over the long term. For photos, it embeds a static view of the photo inside the EPUB instead of depending on a IIIF viewer. For audio and video, it replaces the embedded enhanced media player with a persistent DOI link that points to the current location of the resource. The export package also includes all media files, as well as a CSV registry that indicates which DOI points to which file, so that the linked file can be identified even if the DOI does not resolve. These changes are all applied in a way that conforms to the EPUB 3 standard.
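To illustrate how such a registry could be used, the Python sketch below looks up the packaged file for a DOI without contacting a resolver. It does not reproduce Fulcrum's actual registry: the column names ("doi", "filename"), the DOIs, and the filenames are all placeholders assumed for this example.

```python
import csv
import io

# Hypothetical registry contents; the real layout may differ.
REGISTRY_CSV = """doi,filename
10.5555/example-audio-1,interview_01.mp3
10.5555/example-video-1,walkthrough_01.mp4
"""

DOI_URL_PREFIX = "https://doi.org/"


def load_registry(csv_text):
    """Map each DOI in the registry to the media file shipped in the package."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["doi"]: row["filename"] for row in reader}


def resolve_locally(doi_or_url, registry):
    """Return the packaged filename for a DOI, or None if it is not listed.

    Accepts either a bare DOI or a https://doi.org/ link as found in the EPUB.
    """
    doi = doi_or_url
    if doi.startswith(DOI_URL_PREFIX):
        doi = doi[len(DOI_URL_PREFIX):]
    return registry.get(doi)


registry = load_registry(REGISTRY_CSV)
local_file = resolve_locally("https://doi.org/10.5555/example-audio-1", registry)
```

With a lookup like this, a link in the preserved EPUB can still be matched to its media file if the DOI stops resolving.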

Many publication resources that are supported by modern publishing platforms warrant their own description to ensure they are properly credited, interpreted, and rendered with context in the future. Where possible, include descriptive metadata for each resource. Use an existing standard for guidance on what to include, e.g. Dublin Core. A publisher may be able to leverage data from an art log or author questionnaire to produce this metadata.

These guidelines provide additional context for creating metadata for publication resources:
16. Captions for non-text features add meaningful context
22. Express metadata in an appropriate structured format
25. Express the license information in the resource-level metadata
26. Describe connections between resources in the metadata
27. Assign and use unique persistent identifiers for publication resources

When a publisher acquires rights for resources that are part of the publication, these should also include rights pertaining to the preservation of those resources. Express these rights in the metadata in a way that allows a preservation service to determine what they have permission to preserve and relate them to the relevant material.

These guidelines may also support the creation of license metadata:
8. Clarify the license related to preserving third party web resources
24. Create descriptive metadata for each publication resource
40. Embed license information in the HTML

While developing export processes, attention should be given to describing each resource in the package. If the relationships between the resources are also significant, ensure that this is expressed in the metadata as well. For example, if several data files are dependent on each other, or two items are versions of the same thing, or something should interact with the publication in a specific way, these relationships should be expressed so that they can remain connected in the preserved copy. Ask, what information is needed to restitch the seams between the resources in your package?
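One possible way to record such relationships is sketched below in Python. The identifiers and filenames are invented, and the relation labels ("requires", "isVersionOf") are loosely borrowed from Dublin Core terms; the structure itself is an assumption, not a prescribed format.

```python
# Hypothetical resource-level metadata with explicit relationships.
resources = {
    "res-001": {
        "filename": "survey_data.csv",
        "relations": [{"type": "requires", "target": "res-002"}],
    },
    "res-002": {
        "filename": "survey_codebook.txt",
        "relations": [],
    },
    "res-003": {
        "filename": "survey_data_v2.csv",
        "relations": [{"type": "isVersionOf", "target": "res-001"}],
    },
}


def dangling_relations(resources):
    """List (source, type, target) for relations whose target is not in the package."""
    missing = []
    for source_id, resource in resources.items():
        for relation in resource["relations"]:
            if relation["target"] not in resources:
                missing.append((source_id, relation["type"], relation["target"]))
    return missing


# An empty list means every relationship can be followed inside the package.
problems = dangling_relations(resources)
```

A check like this, run during export, helps confirm that every seam described in the metadata points at something that is actually being preserved.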

See also:
27. Assign persistent identifiers to publication resources; they can help perpetuate connections between resources

Correct handling of character encoding can make an enormous difference to whether a publication is properly rendered. The encoding should be stated in the metadata and/or within the publication, as appropriate for the format. For example, websites may declare the encoding in a meta tag and/or in the charset parameter of the Content-Type HTTP header.
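The short Python sketch below shows why a declared encoding matters and one way to read the charset value carried in a Content-Type HTTP header. The helper name is hypothetical, and the header values are examples only.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the declared charset (lowercased) or None if none is declared."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()


declared = charset_from_content_type("text/html; charset=UTF-8")
undeclared = charset_from_content_type("text/html")

# The same bytes read with the wrong encoding produce garbled text.
original = "café"
stored_bytes = original.encode("utf-8")
decoded_correctly = stored_bytes.decode("utf-8")
decoded_wrongly = stored_bytes.decode("latin-1")
```

When the encoding is recorded alongside the content, a future reader does not have to guess, which avoids output such as the garbled `decoded_wrongly` value.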

Do not send administrative data to a preservation archive unless it is integral to the work. For example, when exporting a SQL database, you may need to exclude or anonymize the content from user tables, indexes that support a specific UI, non-public communications, or logs. Only archive data that can be made publicly accessible.
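As a minimal sketch of this kind of filtering, the Python example below copies a SQLite database while leaving out administrative tables. The table names and the exclusion list are hypothetical; a real export would need a reviewed list for the specific schema and might anonymize some tables instead of dropping them.

```python
import sqlite3

# Hypothetical administrative tables that should not reach the archive.
EXCLUDED_TABLES = {"users", "access_log"}


def export_public_tables(source, target, excluded=EXCLUDED_TABLES):
    """Copy every table except the excluded ones; return the copied table names.

    Only tables are copied, so indexes that support a specific UI are left behind.
    """
    tables = source.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    copied = []
    for name, create_sql in tables:
        if name in excluded:
            continue
        target.execute(create_sql)
        rows = source.execute(f'SELECT * FROM "{name}"').fetchall()
        if rows:
            placeholders = ", ".join("?" for _ in rows[0])
            target.executemany(
                f'INSERT INTO "{name}" VALUES ({placeholders})', rows
            )
        copied.append(name)
    target.commit()
    return copied


# Small in-memory demonstration with made-up content.
source = sqlite3.connect(":memory:")
source.executescript(
    """
    CREATE TABLE chapters (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO chapters VALUES (1, 'Introduction');
    INSERT INTO users VALUES (1, 'reader@example.org');
    """
)
target = sqlite3.connect(":memory:")
copied = export_public_tables(source, target)
```

An explicit exclusion list is easy to review, although an allow-list of tables known to be public would be the more cautious design.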

These guidelines refer to the creation of the installation package:
18. For linear publications, export packages should include core intellectual components of the work separated from the publishing platform and transformed to an existing full-text standard
20. Represent all core intellectual components of the work in the export package
61. Create installation packages for custom websites that don’t require a live server
62. Create installation packages for custom websites that do require a live server

As part of the research that led to Knowing Silence: How Children Talk about Immigration Status in School by Ariana Mangual Figueroa, teenagers were provided with iPod Touches to record audio and video conversations. The author considers the resulting audio files to be core intellectual components, so they were in scope for inclusion in the preservation export package. Due to the sensitive nature of the content of these files, and the need to maintain the privacy of these youth, University of Minnesota Press decided to include the edited, privacy-supporting versions of the files in the export package. This example highlights an instance where privacy concerns shaped the approach to the export package.

For data, software, or any resource that has a complex arrangement of files, if structured metadata cannot be supplied, a common convention is to include a README file from the author. Written in a plain text file format, this should be a note to future users who wish to use the files. It should include information such as scope, purpose, author(s), relevant dates, license for reuse, dependencies, field names/descriptions, and instructions for use.

See also:
68. Provide documentation for software

Consider what a future user of the software might need to know to run the software and understand how it should work. Ensure this is covered by the documentation. For example, what is the software for? What are the supported operating systems and versions? Are there any dependencies or requirements? How do you install it? How do you use it? What should it do if it is working? What is its license? Where the software itself cannot be preserved, visual and narrative documentation of the user experience can provide vital context.

This guideline refers to another common method for documenting software:
66. Use a README file to document data or software

For publications where some content should not be preserved, consider tagging what can be preserved in a consistent way that can be used by preservation export or harvesting processes to exclude items that should not be preserved. Platforms may want to facilitate this tagging.
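A tagging scheme of this kind might be consumed by an export process as in the Python sketch below. The "preserve" field and the resource entries are hypothetical; the sketch assumes an opt-in rule, so anything not explicitly tagged is left out.

```python
# Hypothetical resource list as a platform might expose it to an export process.
resources = [
    {"filename": "chapter01.xhtml", "preserve": True},
    {"filename": "licensed_image.jpg", "preserve": False},
    {"filename": "draft_notes.txt"},  # untagged
]


def select_for_preservation(resources):
    """Keep only resources explicitly tagged as preservable (opt-in)."""
    return [r for r in resources if r.get("preserve") is True]


def excluded_from_preservation(resources):
    """List everything the export will skip, for review by the publisher."""
    return [r for r in resources if r.get("preserve") is not True]


to_export = [r["filename"] for r in select_for_preservation(resources)]
to_review = [r["filename"] for r in excluded_from_preservation(resources)]
```

Treating untagged items as excluded errs on the side of not preserving content without permission; reporting the skipped items lets the publisher catch anything that was left untagged by mistake.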

These guidelines also concern the inclusion and exclusion of content in the preservation process:
10. Define and document core intellectual components that need to be preserved
20. Represent all core intellectual components of the work in the export package
40. Identify the rights for external web content
55. Consider whether it is ethical/appropriate to preserve social media
65. Ensure irrelevant or private administrative data is removed from data exports

Broadly describe the preservation approaches for the different types of content that can be added to a platform. This builds a shared understanding between the publisher, authors, and preservation service about what can be preserved, so that authors can make informed decisions about which enhancements to include in their publications. This documentation could indicate to authors, for example, that they should have appropriate rights to files uploaded into the system and that the files will be shared with a preservation service. It might also define a platform’s approach to third-party content in iframes by stating that content in iframes may not be preserved or maintained. Alternatively, it could instruct authors that all content in iframes will be archived, so iframes should only be used if the content in them is owned by the author or the author has rights that allow it to be harvested by a preservation service. Information about a platform-level approach can be incorporated into or connected to a Terms of Use document, or could take the form of a publicly visible preservation policy.

See also:
6. Keep preservation partners informed of changes
10. Define and document the core intellectual components of a work

Complex and interactive features of a publication may be most vulnerable to change or loss over time, especially if they also have third-party dependencies. Where a layout or interactive feature is important to understanding the work, record a video walkthrough with the author that shows the original intent so that it can be viewed or even recreated in the future. Include this recording in the preserved version of the publication.

See also:
11. If adding a video to the preservation package, consider the format
20. Represent all core intellectual components of the work in the export package
71. Document and share the platform-level approach to preserving components of a publication

Stanford University Press publishes immersive digital monographs that feature non-linear navigation and innovative design. As part of their preservation strategy, they include a video walkthrough for each publication as documentation. This video serves both as a way to introduce readers to an unfamiliar experience and as a record in the event that features stop functioning in the future. Here is a video walkthrough for Stephen Robertson’s Harlem in Disorder.